Skip to content

RISC-V ASM unaligned read/writes: alternative assembly#10530

Open
SparkiDev wants to merge 2 commits into
wolfSSL:masterfrom
SparkiDev:riscv_unaligned_fix
Open

RISC-V ASM unaligned read/writes: alternative assembly#10530
SparkiDev wants to merge 2 commits into
wolfSSL:masterfrom
SparkiDev:riscv_unaligned_fix

Conversation

@SparkiDev
Copy link
Copy Markdown
Contributor

Description

Not all RISC-V chips allow unaligned reads and writes with basic assembly instructions like lw/sw.
Add alternative assembly that is turned on with:
WOLFSSL_RISCV_ASM_NO_UNALIGNED.

Fixes #10525

Testing

./configure --disable-shared LDFLAGS=--static --host=riscv64 CC=riscv64-linux-gnu-gcc --enable-riscv-asm CFLAGS=-DWOLFSSL_RISCV_ASM_NO_UNALIGNED

@SparkiDev
Copy link
Copy Markdown
Contributor Author

Jenkins: retest this please

@SparkiDev SparkiDev force-pushed the riscv_unaligned_fix branch from 3b8e981 to b4611fa Compare May 26, 2026 04:19
@EAlexJ
Copy link
Copy Markdown

EAlexJ commented May 26, 2026

I would like to push back on this implementation a bit.

Currently, all instructions that could be unaligned get emulated.

I think it makes sense to check if the pointers are actually unaligned and only then use the more costly emulation.
I suggest something like this (with the check guarded behind WOLFSSL_RISCV_ASM_NO_UNALIGNED):

Sha512Transform(wc_Sha512* sha512, const byte* data,
    word32 blocks)
{
if((uintptr_t) data % 8) // modulo 8 as the routine uses UNALIGNED_[LS]D
    <emulate using the UNALIGNED_* macro>
else
    <normal execution>
}

There should only be a negligible impact on performance in case the data is aligned. This also only introduces a small size overhead, the check should be a few instructions at most.

@EAlexJ
Copy link
Copy Markdown

EAlexJ commented May 26, 2026

On a separate issue: In my opinion readability would be increased if you just used the UNALIGNED_* Macro unconditionally and only check once in the actual definition of the macro if WOLFSSL_RISCV_ASM_NO_UNALIGNED is defined and then emit the emulation.
This is then similar to how REV8 and others are handled. It would then also make sense to name the macro differently.

EDIT: This is probably obsolete in case the suggestion above gets adopted

@SparkiDev SparkiDev force-pushed the riscv_unaligned_fix branch from b4611fa to bd46955 Compare May 26, 2026 23:05
Not all RISC-V chips allow unaligned reads and writes with basic
assembly instructions like lw/sw.
Add alternative assembly that is turned on with:
WOLFSSL_RISCV_ASM_NO_UNALIGNED.
@SparkiDev SparkiDev force-pushed the riscv_unaligned_fix branch from bd46955 to 26c45cf Compare May 27, 2026 00:21
Copy link
Copy Markdown

@EAlexJ EAlexJ May 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you missed some in

int wc_AesSetKey(Aes* aes, const byte* key, word32 keyLen, const byte* iv,
    int dir)
{

The pointer to key is not necessarily aligned (Found it out the hard way)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you saying then that none of the fields of Aes will be aligned?
If so, then I will need to change the access to the fields: key, reg and tmp.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That I do not know, the best solution for me is to align the buffers at its source, so I went into the ssl struct and corrected the placement of byte* key.
This was sufficient, so in this very case only key seems to not be aligned by default

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, why did you choose to use ALIGN16, is ALIGN8 not already sufficient?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because they are 16 byte buffers in AES, I made it align on 16 bytes.
Looking into it, the vector instructions are 64 bit loads and stores so changing to ALIGN8 will be fine.

@SparkiDev
Copy link
Copy Markdown
Contributor Author

I've modified the macros to check the alignment and choose the sequence to use.
Let me know if that is better.

Thanks,
Sean

Copy link
Copy Markdown

@EAlexJ EAlexJ May 28, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see the following pattern several times:

addi    t, p, o          
andi    t, t, <alignment mask>           
bnez    t, <label>
----
t: scratch register
p: register holding the base address
o: offset

This checks the alignment of p+o, but since o is always a valid offset (is it not?) it is sufficient to check p only.
One instruction can then be saved by doing:

andi    t, p, <alignment mask>           
bnez    t, <label>

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One issue I see is that there is a lot of double checking for alignments:
The bulk variants call the underlying N times, but since the check for the alignment is done in the underlying only, it also gets checked N times.

In case the data is actually aligned there should only be one check.

I'd argue that most of the time the data is (or at the very least should be) aligned, so I'd try to keep the overhead for this case as small as possible.

If the data is unaligned, performance will take a hit even on hardware that supports misaligned loads and stores, e.g. when accesses cross cache-line boundaries.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug]: Several Issues with respect to alignment for RISC-V assembly routines

2 participants